-
Notifications
You must be signed in to change notification settings - Fork 19
feat: add commit message caching system to reduce API costs #118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ve performance - Implement diff-based caching with SHA256 hashing - Add cache management CLI commands (stats, clear, cleanup) - Support all LLM providers with provider-aware caching - Include cost tracking and hit rate statistics - Add comprehensive test coverage (7 test functions) - Maintain backward compatibility with existing functionality"
WalkthroughAdds a persistent, thread-safe commit message cache (manager, hasher, types, tests), integrates cache into StoreMethods and the message generation flow, and exposes CLI commands to view stats, clear, and cleanup the cache. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI
participant Store
participant Cache
participant LLM as LLM Provider
participant Disk
User->>CLI: request commit message
CLI->>Store: GetCachedMessage(provider,diff,opts)
Store->>Cache: Get(key)
alt Cache Hit
Cache-->>Store: cached entry
Store-->>CLI: cached message
else Cache Miss
Cache-->>Store: miss
CLI->>LLM: generate message
LLM-->>CLI: message + cost/tokens
CLI->>Store: SetCachedMessage(provider,diff,opts,message,cost,tokens)
Store->>Cache: Set(entry)
Cache->>Disk: persist JSON
Disk-->>Cache: saved
Store-->>CLI: message
end
CLI-->>User: display message
sequenceDiagram
participant User
participant CLI as Cache CLI
participant Store
participant Cache
participant Disk
User->>CLI: cache stats
CLI->>Store: GetCacheStats()
Store->>Cache: GetStats()
Cache-->>Store: CacheStats
Store-->>CLI: CacheStats
CLI-->>User: display stats
User->>CLI: cache clear
CLI->>User: prompt confirm
alt confirmed
CLI->>Store: ClearCache()
Store->>Cache: Clear()
Cache->>Disk: remove cache file
Cache-->>Store: cleared
Store-->>CLI: success
CLI-->>User: cleared summary
else cancelled
CLI-->>User: abort
end
User->>CLI: cache cleanup
CLI->>Store: CleanupCache()
Store->>Cache: Cleanup()
Cache->>Disk: persist cleaned JSON
Cache-->>Store: removed count
Store-->>CLI: report
CLI-->>User: cleaned summary
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (5 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (4)
🚧 Files skipped from review as they are similar to previous changes (1)
🧰 Additional context used🧬 Code graph analysis (2)internal/cache/cache.go (4)
cmd/cli/store/store.go (4)
🔇 Additional comments (9)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
🧹 Nitpick comments (8)
internal/cache/hasher_test.go (1)
10-85: Good test coverage for basic hash generation.The test validates consistency, non-empty output, and correct SHA-256 hex length (64 chars). Consider adding edge case tests for empty diffs or very large diffs if relevant to your use case.
cmd/cli/createMsg.go (1)
319-319: Consider capturing actual token usage when available.The code uses estimated token counts for caching cost. If provider responses include actual token usage (like the
UsageInfofield inCacheEntry), consider capturing and using those values for more accurate cost tracking.internal/cache/hasher.go (2)
35-58: Review the line sorting strategy for semantic correctness.The normalization sorts all diff lines (line 55) to ensure consistent hashing. While this achieves determinism, it may cause cache hits for semantically different diffs that happen to contain the same set of changes in different orders or contexts.
For example:
- Diff A: adds line X in file1, adds line Y in file2
- Diff B: adds line Y in file1, adds line X in file2
After sorting, both might produce the same normalized diff, leading to a cache hit even though the changes are in different files.
Consider whether this trade-off is acceptable for your use case. If context matters, you might want to preserve hunk structure or include file paths in the normalized output.
85-104: Minor redundancy in whitespace trimming.Lines 88 and 98 both trim whitespace, which is redundant but harmless. Consider simplifying:
func (h *DiffHasher) normalizeLine(line string) string { - // Remove leading/trailing whitespace - line = strings.TrimSpace(line) - // If it's a diff line (+ or -), normalize the content if strings.HasPrefix(line, "+") || strings.HasPrefix(line, "-") { // Keep the + or - prefix but normalize the rest prefix := line[:1] content := line[1:] - - // Remove line numbers and other variable content - // This is a simple approach - in practice, you might want more sophisticated normalization content = strings.TrimSpace(content) - return prefix + content } - return line + return strings.TrimSpace(line) }cmd/cli/cache.go (2)
11-61: Minor redundancy in hit rate display.Lines 27 and 51-52 both display the hit rate: once in the stats table and again in the status message. Consider whether both are necessary, or if the table is sufficient.
// Line 27 (in table) {"Hit Rate", fmt.Sprintf("%.2f%%", stats.HitRate*100)}, // Lines 51-52 (in status message) pterm.Info.Printf("Cache hit rate: %.2f%% (%.0f%% of requests served from cache)\n", stats.HitRate*100, stats.HitRate*100)Otherwise, the function provides good user feedback with clear statistics.
63-104: Clarify cost savings message.Line 100-101 states "saved $%.4f in future API costs", but
stats.TotalCostSavedrepresents historical savings from cache hits, not future savings. Consider rephrasing:- pterm.Info.Printf("Removed %d entries and saved $%.4f in future API costs\n", - stats.TotalEntries, stats.TotalCostSaved) + pterm.Info.Printf("Removed %d entries. Previous cache hits saved $%.4f\n", + stats.TotalEntries, stats.TotalCostSaved)Otherwise, the confirmation flow and messaging are excellent for preventing accidental data loss.
internal/cache/cache.go (2)
295-332: Cost-savings metric is off: track savings from hits, not sum of entry costs.TotalCostSaved should reflect avoided calls: sum over (AccessCount-1) * Cost.
Apply:
- var totalCost float64 + var saved float64 @@ - totalCost += entry.Cost + if entry.AccessCount > 1 && entry.Cost > 0 { + saved += float64(entry.AccessCount-1) * entry.Cost + } @@ - cm.stats.TotalCostSaved = totalCost + cm.stats.TotalCostSaved = savedNote: This assumes Cost is per-generation cost. If Cost is per-token, adjust accordingly.
28-34: Honor config.Enabled (and consider CleanupInterval).Enabled and CleanupInterval are unused. Either remove them or enforce behavior.
Example minimal checks:
func (cm *CacheManager) Get(provider types.LLMProvider, diff string, opts *types.GenerationOptions) (*types.CacheEntry, bool) { - cm.mutex.RLock() - defer cm.mutex.RUnlock() + if !cm.config.Enabled { + return nil, false + } + cm.mutex.Lock() + defer cm.mutex.Unlock()func (cm *CacheManager) Set(provider types.LLMProvider, diff string, opts *types.GenerationOptions, message string, cost float64, tokens *types.UsageInfo) error { - cm.mutex.Lock() + if !cm.config.Enabled { + return nil + } + cm.mutex.Lock() defer cm.mutex.Unlock()Optional: wire CleanupInterval via a background ticker started in NewCacheManager() and stopped via a Close() method.
Also applies to: 60-66, 86-92
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
cmd/cli/cache.go(1 hunks)cmd/cli/createMsg.go(3 hunks)cmd/cli/root.go(3 hunks)cmd/cli/store/store.go(2 hunks)internal/cache/cache.go(1 hunks)internal/cache/cache_test.go(1 hunks)internal/cache/hasher.go(1 hunks)internal/cache/hasher_test.go(1 hunks)pkg/types/types.go(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
internal/cache/hasher.go (3)
pkg/types/options.go (1)
GenerationOptions(4-10)cmd/cli/store/store.go (1)
LLMProvider(55-58)pkg/types/types.go (1)
LLMProvider(5-5)
pkg/types/types.go (2)
internal/llm/provider.go (1)
Provider(24-29)cmd/cli/store/store.go (1)
LLMProvider(55-58)
cmd/cli/store/store.go (4)
internal/cache/cache.go (2)
CacheManager(17-24)NewCacheManager(27-58)pkg/types/types.go (5)
Config(60-63)LLMProvider(5-5)CacheEntry(111-122)UsageInfo(104-108)CacheStats(125-134)pkg/types/options.go (1)
GenerationOptions(4-10)cmd/cli/cache.go (2)
ClearCache(64-104)CleanupCache(107-142)
internal/cache/hasher_test.go (3)
internal/cache/hasher.go (1)
NewDiffHasher(17-19)pkg/types/options.go (1)
GenerationOptions(4-10)pkg/types/types.go (2)
ProviderOpenAI(8-8)ProviderClaude(9-9)
cmd/cli/cache.go (2)
cmd/cli/root.go (1)
Store(11-11)cmd/cli/store/store.go (1)
StoreMethods(18-21)
cmd/cli/root.go (2)
cmd/cli/store/store.go (1)
StoreMethods(18-21)cmd/cli/cache.go (3)
ShowCacheStats(12-61)ClearCache(64-104)CleanupCache(107-142)
internal/cache/cache_test.go (4)
internal/cache/cache.go (1)
CacheManager(17-24)pkg/types/types.go (6)
CacheConfig(137-143)CacheEntry(111-122)CacheStats(125-134)ProviderOpenAI(8-8)Message(80-83)ProviderClaude(9-9)internal/cache/hasher.go (1)
NewDiffHasher(17-19)pkg/types/options.go (1)
GenerationOptions(4-10)
internal/cache/cache.go (4)
pkg/types/types.go (7)
CacheConfig(137-143)CacheEntry(111-122)CacheStats(125-134)LLMProvider(5-5)UsageInfo(104-108)Message(80-83)Config(60-63)internal/cache/hasher.go (2)
DiffHasher(14-14)NewDiffHasher(17-19)pkg/types/options.go (1)
GenerationOptions(4-10)utils/storeUtils.go (1)
GetConfigPath(30-68)
cmd/cli/createMsg.go (5)
internal/llm/provider.go (1)
Provider(24-29)cmd/cli/store/store.go (2)
StoreMethods(18-21)LLMProvider(55-58)pkg/types/types.go (1)
LLMProvider(5-5)pkg/types/options.go (1)
GenerationOptions(4-10)pkg/types/prompt.go (1)
BuildCommitPrompt(30-51)
🔇 Additional comments (23)
cmd/cli/root.go (2)
68-96: LGTM! Cache command structure is well-organized.The new cache command group follows the existing CLI patterns (similar to
llmCmd) with proper error handling viaRunE. The command hierarchy is clear:cacheparent withstats,clear, andcleanupsubcommands.
134-139: LGTM! Command wiring is correct.The cache commands are properly registered with the root command, following the same initialization pattern as existing commands.
pkg/types/types.go (3)
110-122: LGTM! CacheEntry structure is well-designed.The struct captures all necessary metadata for cached messages. Using string timestamps (RFC3339 format based on usage in cache.go) is appropriate for JSON serialization. The optional
Tokenspointer andomitemptytags on optional fields ensure clean JSON output.
124-134: LGTM! CacheStats provides comprehensive metrics.The structure includes all relevant statistics for cache performance monitoring: hit rate, cost savings, and temporal metadata. Field types are appropriate for their purposes.
136-143: LGTM! CacheConfig provides sensible configuration options.The configuration struct covers cache behavior control (enabled flag), capacity limits (MaxEntries), time-based cleanup (MaxAgeDays, CleanupInterval), and persistence (CacheFilePath). The field types and granularity are appropriate.
internal/cache/hasher_test.go (3)
87-114: LGTM! Consistency test is crucial for cache correctness.This test validates deterministic hash generation, which is essential for cache lookups to work correctly. The test properly covers multiple invocations with identical inputs.
116-161: LGTM! Differentiation test prevents cache collisions.This test ensures that different inputs produce different hashes, which is critical for avoiding cache collisions and returning incorrect cached messages.
163-203: LGTM! Cache key generation properly incorporates provider context.The test validates that cache keys are provider-specific (preventing cross-provider cache hits) while remaining consistent for the same provider. The provider name embedding check is a nice touch for debuggability.
internal/cache/cache_test.go (5)
10-68: LGTM! Basic cache operations are well-tested.The test covers the happy path (Set/Get with matching provider) and a negative case (Get with different provider). Field validation ensures cached data integrity.
70-140: LGTM! Statistics tracking is thoroughly tested.The test validates all stat counters (entries, hits, misses) and the derived hit rate calculation. This ensures the cache metrics displayed to users will be accurate.
142-191: LGTM! Cache clearing is properly tested.The test validates that Clear() removes entries and resets statistics, ensuring users can properly manage cache storage.
193-253: LGTM! Persistence test ensures cache durability.This test validates that cache entries survive across CacheManager instances, which is essential for the cache to provide value across multiple tool invocations. The test properly exercises serialization and deserialization.
255-295: LGTM! Normalization test ensures stable cache keys.This test validates that diff normalization removes variable content (headers, paths, timestamps) while preserving the actual changes. This is critical for cache hits to work correctly across different repositories and time periods.
cmd/cli/createMsg.go (2)
122-122: LGTM! Call sites properly updated for caching.Both initial generation and regeneration now use the cache-aware wrapper, passing the necessary Store and provider type for cache lookups and writes.
Also applies to: 185-185
300-329: LGTM! Cache integration logic is sound.The function correctly:
- Checks cache before calling the provider (lines 303-308)
- Only caches first attempts, avoiding regeneration conflicts (lines 303, 317)
- Makes cache write failures non-fatal (line 322-324), ensuring generation still succeeds
- Displays cost savings to users (line 305)
The design choice to cache only initial attempts (not regenerations) is appropriate, as regenerations are requested when users want a different message.
cmd/cli/store/store.go (1)
386-416: LGTM! Cache methods provide clean delegation.The cache management methods properly encapsulate the CacheManager, providing a clean API for CLI commands. The delegation pattern keeps the implementation simple and maintainable.
internal/cache/hasher.go (4)
13-19: LGTM! Simple, stateless hasher design.The DiffHasher is appropriately stateless, making it safe for concurrent use across goroutines.
21-33: LGTM! Hash generation is straightforward and correct.Uses SHA-256 for cryptographic-strength hashing and proper hex encoding. The normalization and input building are delegated to separate functions for clarity.
106-126: LGTM! Hash input building correctly incorporates options.The function properly combines diff and options with a separator. The logic to include
attempt:1only for first attempts aligns with the design to cache only initial generations (not regenerations).
128-132: LGTM! Cache key format properly isolates providers.The key format (
provider:hash) ensures cache entries are provider-specific, preventing cross-provider cache hits which could return messages in an unexpected style.cmd/cli/cache.go (3)
106-142: LGTM! Cleanup command provides clear feedback.The function properly tracks before/after state and provides informative messages about cleanup results. Error handling is appropriate.
146-158: LGTM! Standard byte formatting implementation.The function correctly formats bytes into human-readable units using the conventional 1024-byte base (KiB, MiB, etc.).
160-167: LGTM! Time formatting handles errors gracefully.The function properly parses RFC3339 timestamps and formats them in a readable way. The fallback to the original string on parse errors prevents crashes from malformed cache data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @Laaaaksh can u check the coderabbit suggestions
|
also make sure to update the readme |
|
@DFanso Sorry got occupied. Taking this ahead now. |
|
@DFanso Please check. |
|
Sure @Laaaaksh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved 🎉
Description
Type of Change
Related Issue
Fixes #87
Changes Made
Testing
Checklist
Screenshots (if applicable)
Additional Notes
For Hacktoberfest Participants
Thank you for your contribution! 🎉
Summary by CodeRabbit
New Features
Tests
Documentation